
    Recipe1M: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images

    In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over one million cooking recipes and 13 million food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to learn a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Moreover, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and food and cooking in general. Code, data and models are publicly available. Comment: Submitted to Transactions on Pattern Analysis and Machine Intelligence.
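    A minimal sketch of the approach the abstract describes: two encoders projected into a shared embedding space, trained with a retrieval loss plus an auxiliary high-level classification objective as a regularizer. The feature dimensions, loss weights, and use of pre-extracted features are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointEmbedding(nn.Module):
        # Two linear projections into a shared space, plus a classifier head
        # used only as a semantic regularizer (all sizes are hypothetical).
        def __init__(self, img_dim=2048, rec_dim=1024, emb_dim=1024, num_classes=1048):
            super().__init__()
            self.img_proj = nn.Linear(img_dim, emb_dim)
            self.rec_proj = nn.Linear(rec_dim, emb_dim)
            self.classifier = nn.Linear(emb_dim, num_classes)

        def forward(self, img_feat, rec_feat):
            img_emb = F.normalize(self.img_proj(img_feat), dim=-1)
            rec_emb = F.normalize(self.rec_proj(rec_feat), dim=-1)
            return img_emb, rec_emb

    model = JointEmbedding()
    retrieval_loss = nn.CosineEmbeddingLoss(margin=0.1)  # pulls matching pairs together
    semantic_loss = nn.CrossEntropyLoss()                # high-level classification objective

    # Toy batch: pre-extracted image and recipe features for matching pairs
    # that share a semantic class label (random stand-ins for real features).
    img_feat, rec_feat = torch.randn(8, 2048), torch.randn(8, 1024)
    labels = torch.randint(0, 1048, (8,))
    match = torch.ones(8)  # +1 marks each (image, recipe) pair as a true match

    img_emb, rec_emb = model(img_feat, rec_feat)
    loss = (retrieval_loss(img_emb, rec_emb, match)
            + 0.02 * (semantic_loss(model.classifier(img_emb), labels)
                      + semantic_loss(model.classifier(rec_emb), labels)))  # weight is illustrative
    loss.backward()

    At retrieval time, a query image embedding is compared against recipe embeddings by cosine similarity; the classification head is discarded.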

    Using computer vision to gain health insights from social media

    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references (pages 69-71). In this thesis, I developed machine learning methods to support tools for gaining health insights from social media images. On the one hand, I helped create a dataset for food image segmentation and a segmentation network for this task, which is highly relevant for understanding the nutritional content of food from images. I also explored the flip side of the issue and helped design an interface that users can use to explore food recipes in machine-learning-driven ways. On the other hand, I helped create a dataset and model for classifying types of disasters from images of natural or man-made disasters. By Aritro Biswas. M. Eng.
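    A minimal sketch of the kind of food-image segmentation model the thesis mentions: a small fully convolutional network that predicts a per-pixel food-category mask. The layer sizes and number of classes are assumptions for illustration, not the thesis's actual network.

    import torch
    import torch.nn as nn

    class TinyFoodSegNet(nn.Module):
        # Downsampling encoder followed by a transposed-convolution decoder
        # that restores the input resolution (hypothetical architecture).
        def __init__(self, num_classes=21):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
                nn.ConvTranspose2d(32, num_classes, 2, stride=2),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))  # (B, num_classes, H, W) logits

    logits = TinyFoodSegNet()(torch.randn(1, 3, 224, 224))
    mask = logits.argmax(dim=1)  # predicted food class for every pixel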

    Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images

    In this paper, we introduce Recipe1M+, a new large-scale, structured corpus of over one million cooking recipes and 13 million food images. As the largest publicly available collection of recipe data, Recipe1M+ affords the ability to train high-capacity models on aligned, multimodal data. Using these data, we train a neural network to learn a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Moreover, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M+ dataset and food and cooking in general. Code, data and models are publicly available at http://im2recipe.csail.mit.edu.
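    Once such a joint embedding is trained, image-to-recipe retrieval reduces to a nearest-neighbour search in the shared space, and the semantic vector arithmetic the abstract mentions can be expressed as analogy-style queries. A minimal sketch, using random matrices as stand-ins for the model's actual embeddings:

    import torch
    import torch.nn.functional as F

    recipe_embs = F.normalize(torch.randn(10000, 1024), dim=-1)  # one row per recipe
    query_img_emb = F.normalize(torch.randn(1024), dim=-1)       # embedding of a query photo

    scores = recipe_embs @ query_img_emb       # cosine similarity (all vectors are unit length)
    top5 = torch.topk(scores, k=5).indices     # indices of the 5 best-matching recipes
    print(top5.tolist())

    # Analogy-style vector arithmetic in the same space; the three rows below
    # are hypothetical lookups, shown only to illustrate the query pattern.
    a, b, c = recipe_embs[0], recipe_embs[1], recipe_embs[2]
    analogy = F.normalize(a - b + c, dim=-1)
    nearest = torch.topk(recipe_embs @ analogy, k=1).indices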